Identifying Human Phenotype Terms by Combining Machine Learning and Validation Rules

Authors

  • Manuel Lobo
  • Andre Lamurias
  • Francisco M Couto
Abstract

Named-Entity Recognition is commonly used to identify biological entities such as proteins, genes, and chemical compounds in scientific articles. The Human Phenotype Ontology (HPO) provides a standardized vocabulary for phenotypic abnormalities found in human diseases. This article presents the Identifying Human Phenotypes (IHP) system, tuned to recognize HPO entities in unstructured text. IHP uses Stanford CoreNLP for text processing and applies Conditional Random Fields trained with a rich feature set, which includes linguistic, orthographic, morphologic, lexical, and context features created for the machine learning-based classifier. The main novelty of IHP, however, is its validation step, based on a set of carefully crafted manual rules (such as negative connotation analysis) which, combined with a dictionary, can filter incorrectly identified entities, find missed entities, and combine adjacent entities. The performance of IHP was evaluated on the recently published HPO Gold Standardized Corpora (GSC), on which the Bio-LarK CR system had obtained the best F-measure, 0.56. IHP achieved an F-measure of 0.65 on the GSC. Because of inconsistencies found in the GSC, an extended version was created by adding 881 entities and modifying 4 entities. IHP achieved an F-measure of 0.863 on this new GSC.
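The validation step described in the abstract (filter wrong entities, recover missed ones, combine adjacent ones) can be illustrated with a minimal Python sketch. This is not the IHP implementation: the function names, the stopword-only filter rule, the longest-match dictionary lookup, and the tiny lexicon are all assumptions made for illustration, and the negative connotation analysis is not modeled. Entities are represented as (start, end) token offsets, end exclusive.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def merge_adjacent(spans: Sequence[Span]) -> List[Span]:
    """Combine spans that touch or overlap into a single longer span."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def dictionary_matches(tokens: Sequence[str], lexicon: Set[str],
                       max_len: int = 4) -> List[Span]:
    """Greedy longest-match lookup of token n-grams in a (hypothetical) lexicon."""
    found: List[Span] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in lexicon:
                found.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return found


def validate(tokens: Sequence[str], predicted: Sequence[Span],
             lexicon: Set[str], stopwords: Set[str]) -> List[Span]:
    """Apply three illustrative rules to classifier output."""
    # Rule 1 (assumed): drop predictions made up only of stopwords.
    kept = [(s, e) for s, e in predicted
            if not all(t.lower() in stopwords for t in tokens[s:e])]
    # Rule 2 (assumed): add entities the classifier missed but the lexicon knows.
    kept.extend(dictionary_matches(tokens, lexicon))
    # Rule 3 (assumed): combine adjacent or overlapping entities.
    return merge_adjacent(kept)


tokens = ["Patients", "show", "severe", "short", "stature", "and", "the", "seizures"]
predicted = [(2, 3), (6, 7)]  # hypothetical classifier output: "severe", "the"
result = validate(tokens, predicted, {"short stature", "seizures"}, {"the", "and"})
phrases = [" ".join(tokens[s:e]) for s, e in result]
```

In this toy run the spurious "the" is filtered, "short stature" and "seizures" are recovered from the lexicon, and "severe" is merged with the adjacent "short stature". A real system would draw the lexicon from HPO term labels and synonyms and would need stricter conditions before merging neighbours.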

Volume: 2017

Publication date: 2017